https://www.r-bloggers.com/rvest-easy-web-scraping-with-r/

rvest




install.packages("rvest")

library(rvest)

# html() is the original name of the parser; rvest 0.3.0 and later call it read_html()
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")


To extract the rating, we use html_node() to find the first node that matches
the selector, extract its contents with html_text(), and
convert it to numeric with as.numeric():

lego_movie %>%
  html_node("strong span") %>%
  html_text() %>%
  as.numeric()
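
The same pipeline can be tried without a network connection. The following is a minimal sketch, assuming a recent rvest in which read_html() accepts a literal HTML string (read_html() is the newer name for html()); the markup and the value 7.8 are made up for illustration and are not taken from the IMDb page.

```r
library(rvest)

# Hypothetical stand-in for a downloaded page: two nodes match "strong span"
page <- read_html('<html><body>
  <strong><span>7.8</span></strong>
  <strong><span>not a number</span></strong>
</body></html>')

# html_node() keeps only the first match, so the second span is never read
rating <- page %>%
  html_node("strong span") %>%
  html_text() %>%
  as.numeric()

print(rating)
```

Because html_node() returns a single node, html_text() yields a single string and as.numeric() a single number.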

We use a similar process to extract the cast, using html_nodes() to find all nodes that match the selector:

lego_movie %>%
  html_nodes("#titleCast .itemprop span") %>%
  html_text()
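
To see how html_nodes() differs from html_node(), here is a small offline sketch. It assumes read_html() (the newer name for html()) and uses invented markup and placeholder names that only mimic the structure the selector expects.

```r
library(rvest)

# Hypothetical markup: two cast entries plus one span outside any .itemprop
page <- read_html('<div id="titleCast">
  <table>
    <tr><td class="itemprop"><span>Actor One</span></td></tr>
    <tr><td class="itemprop"><span>Actor Two</span></td></tr>
  </table>
  <span>unrelated text</span>
</div>')

# html_nodes() returns every match, so html_text() gives a character vector
cast <- page %>%
  html_nodes("#titleCast .itemprop span") %>%
  html_text()

print(cast)
```

The span that is not inside an element with class itemprop is skipped, since the selector requires that ancestor.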

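The titles and authors of recent message board postings are stored in the third table on the page. We use html_nodes() and [[ to find it, then coerce it to a data frame with html_table():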

lego_movie %>%
  html_nodes("table") %>%
  .[[3]] %>%
  html_table()
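
The table step can be sketched offline as well. This example assumes read_html() (the newer name for html()) and a made-up page with three small tables; the column names and rows are illustrative only.

```r
library(rvest)

# Hypothetical page with three tables; only the third holds the data we want
page <- read_html('<html><body>
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>b</th></tr><tr><td>2</td></tr></table>
  <table>
    <tr><th>title</th><th>author</th></tr>
    <tr><td>First post</td><td>alice</td></tr>
    <tr><td>Second post</td><td>bob</td></tr>
  </table>
</body></html>')

# Select all tables, pick the third by position, convert it to a data frame
posts <- page %>%
  html_nodes("table") %>%
  .[[3]] %>%
  html_table()

print(posts)
```

Selecting by position is fragile if the page layout changes, so a more specific selector is usually preferable when the table has an id or class.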

Other important functions

If you prefer, you can use XPath selectors instead of CSS:
html_nodes(doc, xpath = "//table//td").
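
As a quick check that the two selector styles are interchangeable here, the sketch below runs both against the same document. It assumes read_html() (the newer name for html()); the contents of doc are invented for the example.

```r
library(rvest)

# Hypothetical document: one table with two cells, plus unrelated content
doc <- read_html('<table><tr><td>x</td><td>y</td></tr></table><div>outside</div>')

# XPath: every td that is a descendant of a table
cells_xpath <- html_nodes(doc, xpath = "//table//td") %>% html_text()

# CSS equivalent of the same query
cells_css <- html_nodes(doc, "table td") %>% html_text()

print(cells_xpath)
print(cells_css)
```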

Extract the tag names with html_tag(),
text with html_text(),
a single attribute with html_attr()
or all attributes with html_attrs().
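
A short sketch of these accessors on a single node follows. Two assumptions: read_html() is used for parsing (the newer name for html()), and the tag name is read with html_name(), which is what current rvest releases call html_tag(). The link markup is made up.

```r
library(rvest)

# Hypothetical page containing one link with two attributes
page <- read_html('<p><a href="https://example.com/a" title="First">One</a></p>')
link <- html_node(page, "a")

tag   <- html_name(link)          # tag name (html_tag() in older rvest)
text  <- html_text(link)          # text content of the node
href  <- html_attr(link, "href")  # a single attribute by name
attrs <- html_attrs(link)         # all attributes as a named character vector

print(tag)
print(text)
print(href)
print(attrs)
```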

Detect and repair text encoding problems with
guess_encoding()
and repair_encoding().

Navigate around a website as if you’re in a browser with
html_session(),
jump_to(),
follow_link(),
back(),
and forward().

Extract, modify and submit forms with
html_form(),
set_values() and
submit_form().

To see these functions in action, check out package demos with
demo(package = "rvest").






Functions in rvest

Name             Description
google_form      Make link to Google form given id.
html_text        Extract attributes, text and tag name from html.
html_form        Parse forms in a page.
html_tag         Extract the tag name from html.
html_table       Parse an html table into a data frame.
encoding         Guess and repair faulty character encoding.
jump_to          Navigate to a new url.
html             Parse an HTML page.
html_nodes       Select nodes from an HTML document.
html_session     Simulate a session in an html browser.
minimal_html     Generate a minimal html5 page.
session_history  History navigation tools.
xml              Work with xml.
%>%              Pipe operator.
submit_form      Submit a form back to the server.
set_values       Set values in a form.
pluck            Extract elements of a list by position.